Correcting Semantic Collocation Errors with L1-induced Paraphrases
Authors
Abstract
We present a novel approach to automatic collocation error correction in learner English, based on paraphrases extracted from parallel corpora. Our key assumption is that collocation errors are often caused by semantic similarity in the writer's first language (L1). An analysis of a large corpus of annotated learner English confirms this assumption. We evaluate our approach on real-world learner data and show that L1-induced paraphrases outperform traditional approaches based on edit distance, homophones, and WordNet synonyms.
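The abstract does not say how the paraphrases are extracted from the parallel corpora. One common technique for this is pivoting: two English phrases count as paraphrase candidates when they align to the same L1 phrase, scored as p(e2 | e1) = Σ_f p(e2 | f) · p(f | e1). The sketch below illustrates that idea only, and it is an assumption that it resembles the paper's method. The function name, the placeholder L1 phrases (`f_look`, `f_timepiece`), and all probabilities are hypothetical toy values, not data from the paper.

```python
from collections import defaultdict


def pivot_paraphrases(p_f_given_e, p_e_given_f, phrase):
    """Rank English paraphrase candidates for `phrase` by pivoting through L1.

    p_f_given_e: dict mapping an English phrase to {L1 phrase: p(f | e)}
    p_e_given_f: dict mapping an L1 phrase to {English phrase: p(e | f)}
    Both tables are assumed to come from a word-aligned parallel corpus.
    """
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(phrase, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != phrase:  # a phrase is not a paraphrase of itself
                scores[e2] += p_f * p_e2  # sum over all pivot phrases f
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy tables: "watch" aligns to two placeholder L1 phrases.
p_f_given_e = {"watch": {"f_look": 0.6, "f_timepiece": 0.4}}
p_e_given_f = {
    "f_look": {"watch": 0.3, "see": 0.4, "look": 0.2, "read": 0.1},
    "f_timepiece": {"watch": 1.0},
}

candidates = pivot_paraphrases(p_f_given_e, p_e_given_f, "watch")
for phrase, score in candidates:
    print(f"{phrase}\t{score:.2f}")
```

In a correction setting, a ranked list like this could serve as the set of replacement candidates for a suspect collocate, in place of candidates drawn from edit distance, homophones, or WordNet synonyms.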
Similar resources
Evaluation on Second Language Collocational Congruency with Computational Semantic Similarity
Collocation learning is one of the important building blocks for the development of language competence. Remarkably, it is influenced by L1 and L2 congruency. The present study thus focused on the distinguishability of the computational similarity values between L2 collocates and L1 counterparts to establish the use of semantic similarity measure as a research instrument. The results showed tha...
L1 Transfer in L2 Acquisition of the There-Insertion Construction by Mandarin EFL Learners
This study examined the role of the native language (L1) transfer in a non-native language (L2) acquisition of the there-insertion construction at the syntax-semantics interface. Specifically, the study investigated if Mandarin EFL learners would make overgeneralization errors in the situation where an L1 argument structure constitutes a superset of its L2 counterpart. Verbs of existence and ap...
Automatic Generation of Syntactically Well-formed and Semantically Appropriate Paraphrases
Paraphrases of an expression are alternative linguistic expressions conveying the same information as the original. Technology for handling paraphrases has been attracting increasing attention due to its potential in a wide range of natural language processing applications; e.g., machine translation, information retrieval, question answering, summarization, authoring and revision support, and r...
Using Paraphrases of Deep Semantic Representations to Support Regression Testing in Spoken Dialogue Systems
Rule-based spoken dialogue systems require a good regression testing framework if they are to be maintainable. We argue that there is a tension between two extreme positions when constructing the database of test examples. On the one hand, if the examples consist of input/output tuples representing many levels of internal processing, they are fine-grained enough to catch most processing errors, ...
Compounds and Productivity in Advanced L2 German Writing: A Constructional Approach
The frequent formation of complex, hierarchically structured compounds is a striking property of German grammar to non-natives. This article asks how compounding works in second language (L2) German grammar, by exploring data from the error-annotated Falko corpus of native and advanced non-native German writing. Beyond differences in overall frequency and productivity of L2 compounding, I use a...